18 research outputs found

    Modern Bioinformatics As A Tool To Understand Genomic And Transcriptomic Variation In Legumes

    University of Minnesota Ph.D. dissertation. July 2018. Major: Computer Science. Advisors: Robert Stupar, Chad Myers. 1 computer file (PDF); ix, 126 pages. The research presented here focuses on the deployment of modern bioinformatics to gain a greater understanding of legume genomes and gene functions. While improvement of legume crops still relies on conventional breeding approaches, transgenesis, the introduction of a foreign piece of DNA in a host genome, is becoming increasingly common. Using a transgenic approach, the integration of foreign DNA into the host genome using Agrobacterium-mediated transformation is almost always random and is known to induce mutations at the insertion site, but questions have been raised about the potential for mutagenesis at other loci. While genetic engineering has been widely used for crop improvement, few studies have addressed the genome-wide effects of transgenesis. Chapters two and three of this thesis address this question in the context of Glycine max (soybean), a major agricultural crop. Specifically, chapter two features a reanalysis of data from a previous study that reported a large number of mutations in soybean transgenic plants and describes several factors that led to an overestimation. Chapter three addresses the effects on the genome in a series of soybean plants transformed with CRISPR/Cas9, the most recently developed platform for genome editing. The findings of this work have implications for the frequency and transmission of novel variation resulting from soybean biotechnology. Chapter four focuses on applying transcriptome network analysis for predicting the genes that underlie nodule development variation in the Medicago-Ensifer symbiosis. Co-expression networks were constructed for Medicago truncatula and were integrated with data from genome-wide association analysis to prioritize candidate genes with a high likelihood of causal association with nodule development phenotypes. This approach sheds light on potential new genetic factors underlying an important phenotype and, more broadly, could be applied to understand genomic and phenotypic variation for a wide range of plant species and traits.

    Genetic Architecture of Soybean Yield and Agronomic Traits

    Soybean is the world’s leading source of vegetable protein and demand for its seed continues to grow. Breeders have successfully increased soybean yield, but the genetic architecture of yield and key agronomic traits is poorly understood. We developed a 40-mating soybean nested association mapping (NAM) population of 5,600 inbred lines that were characterized by single nucleotide polymorphism (SNP) markers and six agronomic traits in field trials in 22 environments. Analysis of the yield, agronomic, and SNP data revealed 23 significant marker-trait associations for yield, 19 for maturity, 15 for plant height, 17 for plant lodging, and 29 for seed mass. A higher frequency of estimated positive yield alleles was evident from elite founder parents than from exotic founders, although unique desirable alleles from the exotic group were identified, demonstrating the value of expanding the genetic base of US soybean breeding.

    The importance of genotype identity, genetic heterogeneity, and bioinformatic handling for properly assessing genomic variation in transgenic plants

    Background: The advent of -omics technologies has enabled the resolution of fine molecular differences among individuals within a species. DNA sequence variations, such as single nucleotide polymorphisms or small deletions, can be tabulated for many kinds of genotype comparisons. However, experimental designs and analytical approaches are replete with ways to overestimate the level of variation present within a given sample. Analytical pipelines that neither apply proper thresholds nor assess reproducibility among samples are susceptible to calling false-positive variants. Furthermore, issues with sample genotype identity or failing to account for heterogeneity in reference genotypes may lead to misinterpretations of standing variants as polymorphisms derived de novo.
    Results: A recent publication that featured the analysis of RNA-sequencing data in three transgenic soybean event series appeared to overestimate the number of sequence variants identified in plants that were exposed to a tissue culture-based transformation process. We reanalyzed these data with a stringent set of criteria and demonstrate three different factors that lead to variant overestimation, including issues related to the genetic identity of the background genotype, unaccounted genetic heterogeneity in the reference genome, and insufficient bioinformatics filtering.
    Conclusions: This study serves as a cautionary tale to users of genomic and transcriptomic data who wish to assess the molecular variation attributable to tissue culture and transformation processes. Moreover, accounting for the factors that lead to sequence variant overestimation is equally applicable to samples derived from other germplasm sources, including chemical or irradiation mutagenesis and genome engineering (e.g., CRISPR) processes.

    Additional file 1: of The importance of genotype identity, genetic heterogeneity, and bioinformatic handling for properly assessing genomic variation in transgenic plants

    Figure S1. Pipeline to identify the background genotype of 764. Figure S2. Quality scores for all polymorphic variants (SNPs and indels) called in the Lambirth et al. [22] study. Figure S3. Number of overlapping polymorphisms in the Lambirth et al. [22] study within each of the 12 sibling families studied. (PPTX 1455 kb)

    Additional file 2: of The importance of genotype identity, genetic heterogeneity, and bioinformatic handling for properly assessing genomic variation in transgenic plants

    Table S1. SNP calls resulting from the data filtering pipeline shown in Additional file 1: Figure S1, excluding the accession identification steps. The SNPs correspond to the top row in Table 1. Table S2. Indel calls resulting from the data filtering pipeline shown in Additional file 1: Figure S1, excluding the accession identification steps. (XLSX 2307 kb)

    Integrating Co-Expression Networks with GWAS to Detect Causal Genes Driving Elemental Accumulation in Maize

    Genome-wide association studies (GWAS) have identified thousands of loci linked to hundreds of traits in many different species. However, in many cases, the causal genes and the cellular processes they contribute to remain unknown. This problem is even more pronounced in non-model species where functional annotations are sparse and there is poor resolution in single nucleotide polymorphism (SNP) boundaries. The vast amounts of data available from high-throughput sequencing, such as RNA-Seq, are a tantalizing resource to leverage in identifying potential candidates under GWAS SNPs, though they are often underutilized or difficult to interpret. To mitigate these issues, we systematically integrate whole-genome SNP data with functional information derived from gene co-expression networks using a computational framework called Camoco. Camoco scores interactions among genes near GWAS peaks and establishes significance using a robust bootstrapping model. We demonstrate the precision of our method by simulating GWA studies using Gene Ontology (GO) terms. We then used our method to functionally inter-relate loci identified in a large-scale GWA study characterizing elemental accumulation in maize kernels. Our results demonstrate that simply taking the closest genes to significant GWAS SNPs will often lead to spurious results, demonstrating the need for proper functional modeling and bootstrapping. Additionally, when deriving functional information from gene transcriptional networks, the biological context from which the transcription was measured is important. Inclusion of gene expression data from tissues not relevant to the elemental phenotypes collected abolishes the relationships between the co-expression networks and the GWAS SNPs. In the correct biological context, genes linked to GWAS hits for elemental accumulation were more significantly co-expressed than genes within similarly structured GO terms. Our framework provides a method to systematically evaluate the putative functional relationships among GWAS candidate loci as well as to efficiently prioritize gene lists produced from GWA studies.
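
The general idea described in this abstract, scoring co-expression among genes near GWAS peaks and judging significance against bootstrapped random gene sets, can be illustrated with a small sketch. This is not Camoco's API or its actual scoring model: the function names (`subnetwork_density`, `bootstrap_pvalue`), the toy co-expression scores, and the uniform random sampling of genes for the null distribution are all assumptions made for illustration.

```python
import random
from itertools import combinations


def subnetwork_density(genes, coexpr):
    """Mean pairwise co-expression score among a set of genes.

    `coexpr` maps frozenset({gene_a, gene_b}) -> score; missing pairs count as 0.
    """
    pairs = list(combinations(sorted(genes), 2))
    if not pairs:
        return 0.0
    return sum(coexpr.get(frozenset(p), 0.0) for p in pairs) / len(pairs)


def bootstrap_pvalue(candidates, all_genes, coexpr, n_boot=1000, seed=0):
    """Compare the candidates' density to random gene sets of the same size.

    Simplification (assumption): random sets are drawn uniformly from all genes,
    ignoring locus structure such as window size or gene density.
    """
    rng = random.Random(seed)
    observed = subnetwork_density(candidates, coexpr)
    k = len(candidates)
    hits = 0
    for _ in range(n_boot):
        sample = rng.sample(all_genes, k)
        if subnetwork_density(sample, coexpr) >= observed:
            hits += 1
    # Add-one correction so the empirical p-value is never exactly zero.
    return observed, (hits + 1) / (n_boot + 1)


# Toy data (hypothetical): 30 genes, 4 candidates near GWAS peaks that are
# strongly co-expressed with one another; every other pair scores 0.
all_genes = [f"g{i}" for i in range(30)]
candidates = ["g0", "g1", "g2", "g3"]
coexpr = {frozenset(p): 3.0 for p in combinations(candidates, 2)}

observed, p_value = bootstrap_pvalue(candidates, all_genes, coexpr)
print(f"observed density = {observed:.2f}, empirical p = {p_value:.4f}")
```

A small empirical p-value here only says that the candidate set is more tightly co-expressed than random sets of equal size in the toy network; a realistic null would need to preserve the structure of the GWAS loci, as the abstract's comparison to similarly structured GO terms suggests.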

    An Induced Chromosomal Translocation in Soybean Disrupts a KASI Ortholog and Is Associated with a High-Sucrose and Low-Oil Seed Phenotype

    Mutagenesis is a useful tool in many crop species to induce heritable genetic variability for trait improvement and gene discovery. In this study, forward screening of a soybean fast neutron (FN) mutant population identified an individual that produced seed with nearly twice the amount of sucrose (8.1% on dry matter basis) and less than half the amount of oil (8.5% on dry matter basis) as compared to wild type. Bulked segregant analysis (BSA), comparative genomic hybridization, and genome resequencing were used to associate the seed composition phenotype with a reciprocal translocation between chromosomes 8 and 13. In a backcross population, the translocation perfectly cosegregated with the seed composition phenotype and exhibited non-Mendelian segregation patterns. We hypothesize that the translocation is responsible for the altered seed composition by disrupting a β-ketoacyl-[acyl carrier protein] synthase 1 (KASI) ortholog. KASI is a core fatty acid synthesis enzyme that is involved in the conversion of sucrose into oil in developing seeds. This finding may lead to new research directions for developing soybean cultivars with modified carbohydrate and oil seed composition.

    Additional file 1: of Genomic variation and DNA repair associated with soybean transgenesis: a comparison to cultivars and mutagenized plants

    Table S1. Resequenced fast neutron genotypes, all from the forward screen family, Bolon et al. [1]. Table S2. Summary of data type, CGH design, and analysis method for Inter-cultivar, Fast Neutron, and Transgenic genotypic classes. Table S3. Summary of SNP frequencies in a subsample of fast neutron and transgenic plants. Table S4. Genotypes and regions used to develop CGH log2 ratio empirical thresholds. Table S5. Genotypes examined by CGH. Table S6. Sequences of PCR primers used for genotyping. (DOCX 48 kb)